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I.  INTRODUCTION  §  SUMMARY 


1 . 1  INTRODUCTION 


Systematic  procedures  for  the  identification  of  dynamic 
systems  have  been  developed  over  the  last  two  decades.  These 
methods  have  been  successfully  applied  to  a  variety  of  vehicles 
and  other  systems.  Most  of  the  methods  are  based  on  several 
significant  assumptions. 

(1)  Models  used  in  parameter  estimation  step  are  correct. 

(2)  The  state  and  measurement  noise  follows  a  Gaussian 
distribution  (this  assumption  is  made  both  in  model 
structure  determination  and  parameter  identification). 

(3)  The  noise  sources  are  white  or  have  a  known  rational 
spectrum. 

(4)  All  unknown  parameters  about  which  there  is  information 
in  the  data  are  identified. 

(5)  Sufficient  data  is  available  such  that  asymptotic 
estimator  properties  are  valid. 

Real  data  often  do  not  follow  these  assumptions  leading 
to  an  inefficient  estimator.  The  following  symptoms  which  indicate 
a  lack  of  estimator  efficiency  have  been  observed. 

(1)  Residuals  are  non-white  and  non-Gaussian. 

(2)  The  actual  estimation  errors  are  much  higher  than 
those  Dredicted  by  statistical  analysis  (.Cramer-Rao 
bounds J . 

(3)  The  estimation  errors  are  unacceptable  when  too  many 
parameters  are  estimated. 

(4)  The  parameter  estimates  often  have  smaller  error  when 
zero  state  noise  is  used  compared  to  the  estimates 
when  true  state  noise  is  used. 

(5)  It  is  extremely  difficult  to  get  good  results  from 
short  data  records. 


System  identification  methods  are  at  a  stage  where  the 
issues  described  above  need  to  be  attacked.  This  report  formulates 
procedures  to  treat  non-Gaussian ,  non-white  noise  statistics 
in  order  to  develop  systematic  algorithms  and  an  interpretative 
framework  for  treating  actual  data. 


1 . 2  RESULTS 


The  investigation  into  nonconventional  noise  sources  has 
been  divided  into  two  parts.  The  first  part  develops  parameter 
estimation  methods  with  non-Gaussian  noise  in  state  and  measurement 
equations.  The  following  is  a  summary  of  significant  results 
of  this  work. 

(1)  Heavy  tailed  distributions  can  markedly  degrade  estima¬ 
tion  accuracy.  Thin  tailed  or  amplitude  limited  noise 
has  minor  influence  on  estimation  error. 

(2)  A  simple  rejection  of  outliers  approach  is  both  statis¬ 
tically  inefficient  and  computationally  undesirable. 

(3)  Advanced  methods  are  developed  that  lead  to  improvements 
in  both  state  and  parameter  estimation  accuracy. 

(Thus,  a  robust  Kalman  filter  is  a  byproduct). 

(4)  A  simple  example  demonstrates  that  parameter  estimation 
errors  can  be  reduced  by  a  factor  of  three  by  using 
robust  estimation.  A  larger  improvement  is  expected 

in  parameters  which  are  only  marginally  identifiable. 

The  second  part  of  the  work  studies  non-white  noise.  The 
following  results  have  been  obtained. 

(1)  The  non-white  noise  does  not  cause  a  bias  in  estimates. 

(2)  Low  frequency  noise  usually  increases  estimation  error 
while  the  high  frequency  noise  decreases  it.  Unfortu¬ 
nately,  most  systems  have  low  frequency  noise. 

(3)  The  non-white  noise  could  be  corrected  by  building 

an  appropriate  filter  or  a  whiteness  insensitive  esti¬ 
mator  can  be  designed. 

(4)  Cramer-Rao  bounds  based  on  white  noise  assumptions 
are  significantly  different  than  if  a  colored  noise 
assumption  is  used.  Since  most  current  analyses  are 


based  on  the  the  white  noise  assumption,  Cramer-Rao 
bounds  have  been  a  poor  measure  of  estimation  errors. 

The  noise  spectrum  should  be  estimated  and  the  correct 
spectrum  should  be  used  in  deriving  estimates  of  errors. 

(S)  When  the  noise  spectrum  is  nonrational  (not  a  ratio 
of  polynomials),  optimal  parameter  estimators  are 
difficult.  It  is  often  reasonable  to  approximate 
the  spectrum  by  a  ratio  of  polynomials. 

1 . 3  SUMMARY 


Section  II  of  this  report  describes  parameter  estimation 
problems  associated  with  non-Gaussian  noise.  This  is  followed 
by  the  discussion  of  non-white  noise  in  Section  III.  Finally, 
conclusions  and  areas  of  future  investigation  are  given  in  Section  IV. 
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II.  PARAMETER  ESTIMATION  IN  THE  PRESENCE  OF 
NON-GAUSSIAN  NOISE 


2.1  INTRODUCTION 

Measurement  errors  are  the  sum  of  inaccuracies  from  a  number 
of  sources.  These  errors  can  be  divided  into  two  broad  classes: 

(1)  systematic  errors,  and  (2)  random  errors.  Each  error  follows 
some  probability  distribution  but  is  otherwise  unpredictable. 

The  systematic  errors  are  identified  during  the  measurement 
system  calibration  tests.  During  the  parameter  estimation  stage, 
these  errors  are  set  to  test  values  or  are  jointly  estimated 
with  states/parameters  of  interest.  Since  random  errors  change 
with  time  in  an  unpredictable  manner,  their  effect  is  minimized 
by  the  use  of  a  filter. 

For  measurements  z,  dependent  on  parameters  e,  the 

/v 

most  likely  values  8  of  the  parameters  8  can  be  determined 
by  least  squares,  minimum  variance,  or  maximum  likelihood  methods. 
All  these  methods  assume  that  noise  probability  distributions 
are  known  and  all  errors  follow  the  assumed  probability  distribu¬ 
tions.  The  Gaussian  assumption,  for  example,  leads  to  the  least- 
squares  solution.  Approximations  are  required  to  provide  practical 
solutions  in  estimation  problems.  Failures  of  components  in 
instrument  systems,  local  inaccuracies,  sudden  environmental 
changes,  and  the  occurrence  of  gross  errors  are  normally  not 
considered  in  the  assumed  probability  distribution  of  the  measure¬ 
ment  system  noise  parameters.  Because  of  largely  increased 
complexity  in  modern  sensor  systems,  however,  these  errors  have 
become  increasingly  more  important  and  define  the  need  for  estima¬ 
tion  procedures  that  are  not  very  sensitive  to  departures  from 
the  assumptions  on  which  they  depend,  robust  estimation. 


2.2  TREATMENT  OF  NON- GAUSSIAN  NOISE 


To  treat  the  subject  of  non-Gaussian  noise  sources,  two 
cases  are  considered:  (1)  the  noise  distribution  is  known, 
and  (2)  the  noise  distribution  is  not  known. 

When  the  noise  distribution  is  known,  maximum  likelihood, 
minimum  variance  or  Bayes'  approaches  may  be  used  for  parameter 
estimation.  Such  approaches  give  nonlinear  estimators  even 
for  linear  systems.  In  general,  optimal  estimators  for  nonlinear 
systems  are  complex.  To  illustrate  the  likelihood  method,  Appen¬ 
dix  A  derives  a  maximum  likelihood  parameter  estimation  algorithm 
for  Poisson  noise  distribution. 

When  the  noise  distribution  is  not  known,  two  approaches 
may  be  used  for  estimation  and  identification. 

(1)  In  the  first  approach,  noise  probability  density  functions 

are  estimated.  These  probability  density  functions 

are  used  to  derive  optimal  estimators. 

(2)  In  the  second  approach,  robust  methods  are  used. 

These  methods  are  minimally  sensitive  to  distributions 

of  noise.  Some  efficiency  (e.g.  accuracy)  is  lost 

if  the  distribution  is,  in  fact,  Gaussian. 

The  rest  of  this  chapter  will  deal  with  robust  estimators. 

2.3  REVIEW  OF  PREVIOUS  WORK 

The  significance  of  the  Gaussian  assumption  has  been  known 
for  a  long  time.  Tukey  (1960)  and  Huber  (1972)  compared  variances 
of  parameter  estimates  computed  by  minimizing  mean  deviations 

d  =  arg  nun  (^  E|x.  -  x|)  (2.1) 

x  i  1 

to^  thos$  computed  hy  minimizing  mean  square  deviations 

s  *  arg  min  (I  E  (x.  -  x)2)^  (2.2) 

x  11  i  1 
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for  an  error  which  is  normally  distributed  but  is  contaminated 
by  another  normally  distributed  random  variable  whose  mean  square 
value  is  three  times  higher.  They  showed  that  a  contamination 
of  0.2%  suffices  to  make  the  asymptotic  efficiency  of  the  mean 
deviation  larger  than  the  asymptotic  efficiency  of  the  mean 
square  deviation  (Huber,  1977,  p.2).  Thus,  some  estimation 
methods  are  very  sensitive  to  deviations  from  the  assumed  distri¬ 
bution.  Note  that  the  mean  deviations  method  is  not  the  best 
for  mixtures  of  distributions  or  distribution  uncertainties. 

Maximum  likelihood  methods,  which  fit  smooth  distribution  functions 
through  the  actual  error  statistics  (Hall/Gupta,  1974)  give 
generally  much  better  results. 

One  of  the  main  reasons  for  the  sensitivity  of  the  estimation 
methods  to  deviations  from  assumed  error  statistics  results 
from  their  extreme  behavior  in  the  distribution  tails  (Huber, 

1972;  Hampel,  1971).  Pierce  (1852),  Freedman  (1966),  and  others 
showed  that  engineering  and  scientific  data  have  typically  outliers 
of  several  percent  that  fatten  the  tails  of  the  distribution 
of  random  variables^  Test  data  of  submarines  and  aircraft  have 
typically  1%  to  5%  outliers.  Legendre  (1805)  suggested  robustify- 
ing  the  least-squares  estimator  by  rejecting  all  data  which 
have  obvious  errors  much  larger  than  the  remaining  data.  Airy 
(1856)  pointed  out  that  this  rejection  method  is  not  optimal, 
since  it  ignores  the  information  content  of  the  rejected  measure¬ 
ment.  During  the  last  100  years,  different  weighting  methods 
were  developed  that  reduce  the  sensitivity  of  the  least-squares 
estimator.  This  research  led  to  the  maximum  likelihood  method 
as  one  of  the  best  linear  estimators. 

Noether  (1967),  Birnbaum/Laska  (1967),  Htfyland  (1968)  show 
that  the  sample  mean  is  very  sensitive  to  grouping  effects  (e.g. 
tests  or  experiments  done  at  different  times  at  changed  environ¬ 
mental  conditions),  and  pairwise  median  estimators  reduce  the 
error  covariance  of  the  estimate  up  to  18%  in  the  presence  of 
gross  errors. 


H*iyland  (1968),  Gastwirth/Rubin  (1969),  Parks  (1967),  and 
Jain  (1975)  show  the  sensitivity  of  estimates  to  dependencies 
and  correlations  between  the  errors. 

2.4  CONFIDENCE  MAPPING  FOR  ROBUST  ESTIMATION  IN  STATIC  SYSTEMS 

In  the  following  sections,  the  method  of  confidence  mapping 
is  introduced  that  makes  the  estimates  less  sensitive  to  the 
extreme  tails  of  the  data,  with  only  a  small  sacrifice  in  optimal¬ 
ity  of  the  estimate.  This  method  assumes  that  only  statistics 
that  belong  to  the  open  set,  bounded  by  the  confidence  boundaries, 
follow  the  assumed  error  distribution  function  and  errors  outside 
the  confidence  boundaries  belong  to  different  and  unknown  distribu¬ 
tion  functions.  Using  this  binary  order  statistic,  the  errors 
outside  the  confidence  boundaries  are  mapped  inside  the  boundaries 
by  a  confidence  function  and  then  treated  as  if  they  follow 
the  assumed  distribution  of  errors.  For  the  estimation  problem, 
the  confidence  boundaries  have  to  be  chosen  such  that  the  error 
inside  the  boundaries  are  sufficient  statistics  for  the  problem 
(generally  around  2a)  and  the  confidence  mapping  function  has 
to  be  chosen  such  that  it  does  not  violate  the  conditions  for 
global  optimality  (convexity  condition). 

2.4.1  Effect  of  Deviations  from  the  Assumed  Probability 
Distribution 


The  mean  of  a  random  variable  is 


=  /x  p(a,x)  dx  , 


and  its  covariance  is 


var  x  =  f x}  p(a,x)dx, 


(2.3) 


(2.4) 
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where  p(a,x)  is  the  probability  density  function  of  the  random 
variable  x,  and  "a"  is  a  parameter  defining  the  density  function 
from  a  particular  class. 

Deviations  from  the  assumed  distribution  function  (Prokhorov, 
1956)  occur  when  the  random  variables  temporarily  belong  to 
a  different  distribution  function.  To  determine  the  effect 
of  deviations  from  the  assumed  distribution  function  we  find 
the  sensitivity  of  incremental  contributions  to  mean  and  covariance 
due  to  distribution  parameter  changes. 

The  incremental  contributions  of  particular  values  of  the 
random  variable,  x,  to  mean  and  covariance  are 

ux  =  x  p(a,x)  (2.5) 

vx  =  x2  p(a , x)  (2.6) 


and  the  sensitivities  to  distribution  parameter  changes  are 


u  =  x 

3a 

v  _  v-2  ap(a»*) 
x  “  3a 

a 


(2.7) 

(2.8) 


The  sensitivity  of  some  of  the  common  symmetric  density  functions 
are  shown  in  Table  2.1.  The  highest  sensitivity  to  deviations 
from  the  assumed  distribution  functions  have  normally  distributed 
random  variables.  The  sensitivity  is  small  for  random  variables 
less  than  2a  away  from  their  mean,  but  it  becomes  large  further 
away  from  the  mean.  The  sensitivity  of  the  mean  of  normally 
distributed  random  variables  increases  with  the  3rd  power  of 
x,  and  the  sensitivity  of  the  variance  with  the  4th  power. 

This  explains  the  high  sensitivity  of  the  least  squares  method 
to  deviations  from  the  assumed  probability  distribution  (Newcomb, 
1886;  Tukey ,  1960;  Huber,  1972). 
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Changes  in  Distribution  Parameters 


The  sensitivity  function  of  an  exponential  distribution 
function  increases  only  with  the  2nd  power  of  x  for  the  mean 
and  3rd  power  of  x  for  the  variance;  it  is  therefore  less 
sensitive  to  deviations  from  the  nominal  distribution  function 
for  large  values  of  x  or  outliers.  The  sensitivity  function 
of  a  uniform  distribution  increases  only  with  1st  power  of  x 
for  the  mean  and  2nd  power  of  x  for  the  variance.  It  is  there¬ 
fore  least  sensitive  to  distribution  uncertainties. 

The  sensitivity  functions  show  additionally  that  the  mean 
of  symmetrically  distributed  random  variables  is  not  changed 
by  corruption  from  other  symmetrically  distributed  random  variables, 
since  the  sensitivity  functions  for  the  mean  are  of  odd  powers 
of  x.  The  variance  and  the  uncertainty  in  an  estimate  are 
of  even  power  of  x  and  are  therefore  affected  by  errors  in 
distributions. 

We  also  observe  that  the  sensitivity  increases  with  the 
third  power  of  the  inverse  of  the  uncertainty  for  normally  distri¬ 
buted  random  variables.  This  is  of  particular  importance  for 
the  update  of  innovation  processes,  which  are  generally  weighted 
with  the  inverse  of  the  square  of  the  estimate  uncertainty. 

For  the  number  of  data  n-*-«  the  estimate  uncertainty  becomes 
very  small  and  therefore  the  ratio  x/a  becomes  very  large. 

Hence,  the  relative  sensitivity  to  distribution  uncertainties 
increases  with  the  number  of  data  points. 

In  summary,  the  more  concentrated  the  assumed  distribution 
is  at  its  mean  and  the  flatter  the  assumed  tail,  the  more  sensi¬ 
tive  the  estimates  are  to  deviations  from  the  assumed  distribution. 

In  order  to  robustify  an  estimator,  we  have  to  reduce  the 
sensitivity  of  the  estimator  to  the  tail  distribution  at  the 
cost  of  a  small  increase  in  the  variance  of  the  estimate  for 
nominal  distribution. 
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2.4.2  Common  Probability  for  Errors  Outside  and  Inside  the 
Confidence  Boundaries 

Errors  within  the  measurement  accuracy  of  a  sensor  system, 
e.g.  errors  within  our  confidence  boundaries,  can  often  be  assumed 
normal  or  Poisson-distributed.  Errors  outside  the  confidence 
boundaries  are  generally  due  to  failures,  partial  failures  of 
components,  and  signal  combinations  not  considered  in  the  design 
of  the  measuring  system.  These  errors  are  typically  exponentially 
distributed  or  do  not  belong  to  any  dense  set  of  values.  An 
estimator  based  on  the  exponential  closure  to  this  mixture  of 
errors  does  not  give  sufficient  weight  to  measurements  with 
small  errors.  This  estimator  will  be  very  robust  to  uncertainties 
in  the  distribution,  but  it  will  sacrifice  on  optimality,  i.e. 
the  uncertainty  of  the  estimate  will  be  unnecessarily  large. 

2.4.3  Rejection  of  Outliers 

Legendre  (1805)  and  Merrill  (1972)  treat  errors  outside 
some  confidence  boundaries  as  total  failures  of  the  measurement 
system  and  ignore  the  corresponding  measurements  for  estimation 
purposes.  Airy  (1856),  Ellis  (1844),  Fisher  (1926),  and  Huber 
(1964)  point  out  that  environmental  influences  and  partial  failures 
often  cause  outliers  which  contain  information  in  a  degraded 
but  not  lost  form.  Rejecting  these  measurements  eliminates 
the  corresponding  information  contents  and  hence  makes  the  estimator 
nonoptimal.  Additionally,  this  method  can  lead  to  estimator 
instabilities  for  errors  about  the  confidence  boundaries. 

De  Laplace  (1818)  and  Edgeworth  (1886)  showed  that  the 
median  is  the  optimal  estimator  when  errors  follow  no  particular 
distribution  and  can  be  assumed  to  be  uniformly  distributed 
in  a  non-dense  set  of  values.  This  is  a  non-parametr ic  estimator 
that  chooses  the  median  random  variable  in  an  ordered  set  as 
the  best  estimate  and  rejects  all  other  data.  Therefore,  its 
expected  accuracy  is  directly  proportional  to  the  density  of 

! 


the  measurements.  Edgeworth  (1886)  showed  also  that  the  median 
is  better  than  the  least  squares  estimator  for  mixtures  of  Gaussian 
distributed  errors. 


2.4.4  Confidence  Mapping 

Weighting  the  variables  with  some  confidence  measure  that 
reduces  the  incremental  influence  of  random  variables  from  the 
tails  of  a  distribution  will  robustify  the  estimator.  This 
confidence  mapping  function  (Salzwedel,  Gupta,  1979)  has  to 
be  chosen  in  such  a  way  that  it  robustifies  the  estimator  without 
destabilizing  it  for  particular  errors  or  groups  of  errors  and 
leaves  the  estimator  nearly  optimal  in  the  strict  parametric 
sense. 

Instead  of  using  the  estimate 
00 

E(x)  =  fx  p ( x )  dx,  (2.9) 

—  00 

where  p(x)  is  the  nominal  probability  density  function,  the 
estimate  has  now  the  form 

00 

E(x)  *  fx  c(x)  p ( x )  dx, 

—  00 

where  c(x)  is  a  confidence  mapping  function  of  the  form 
-/-I  _  probust^ 

cU) - pnrr- 

The  variance  is  then 

00 

var(x)  =  fx 2  c2(x)p(x)  dx  (2.12) 

-00 

The  rejection  method  of  Section  2.4.3  can  be  seen  as  a 
confidence  mapping  function  which  maps  the  measurement  into 
the  a  priori  estimate  and  hence  does  not  change  the  estimate. 


(2.10) 


(2.11) 


Winsorizing  maps  the  errors  outside  the  confidence  boundaries 
into  the  confidence  boundaries.  Huber's  M-estimator  uses  a 
straight  line  continuation  on  the  convex  maximum  likelihood 
function  inside  the  confidence  boundaries  to  reduce  the  destabiliz¬ 
ing  effect  of  outliers. 

The  estimation  problem  can  be  formulated  in  the  form 

E p(x. , 9)  =  min,  (2.13) 

i  1 

where  p(x,0)  is  some  arbitrary  function  and  9  is  the  optimal 
estimate  of  the  parameter  9. 

If  x.  belongs  to  a  dense  set,  R  ,  the  problem  can  be 

1  A 

stated  in  differentiated  form 

E  (Xj  ,  9)  =  0,  (2.14) 

where 

<1*  (x,  9)  =  P  (x,  9) 

(Note:  p(x,9)  =  -  log  f(x,9)  gives  an  ordinary  ML  estimate.) 

If  9  is  a  location  parameter,  the  problem  becomes 

A 

E  p (x. -  9)  =  rain 
i  1 
or 

E  >Kx.  -  9)  =  0. 
i  1 

Equation  (2.14)  can  be  written 

Ew.  (x.  -9)  =  0;  (2.17) 

i  1  1 


(2.15) 

(2.16) 


with 


Wi  a  x. 


»(Xj -  9) 
9 
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we  get  the  weighted  mean 


2.4.4. 1  Huber's  M-Estimator 

Discussing  Gauss's  arithmetic  mean  (solution  to  ^tx^  a)=0), 

Ellis  (1844)  introduced  a  function  <J>(.)  that  gives  different 
weight  to  measurements  further  away  from  the  mean, 

EiKxj-a)  =  0,  (2.19) 

and  brought  up  the  question  of  choosing  the  function  such  as 
to  obtain  a  stable  estimator.  Huber  (1964)  calls  this  estimator 
M-estimator  (maximum  likelihood  estimator)  and  modifies  the 
function  \ p  such  that  it  corresponds  to  the  ordinary  inverse- 
log  maximum  likelihood  function, 

p(x,e)  =  -  log  f(x,9)  (2.20) 

and  <Hx,e)  =  gQP(x,e),  for  random  variables  inside  the  confi¬ 
dence  boundaries  and  a  straight  line  continuation  of  p(x,0) 
outside  the  confidence  boundaries.  This  corresponds  to  a  normal 
distribution  function  inside  the  confidence  boundaries  and  an 
exponential  distribution  function  outside.  This  likelihood 
function  gives  minimum  weight  to  measures  outside  the  confidence 
boundaries  without  violating  the  convexity  condition  for  a  global 
optimal  estimate. 

For  normally  distributed  errors  inside  the  confidence  boun¬ 
daries  and  exponentially  errors  outside,  p»  is  of  the  form 


IS 


c  (x) 


X 

T 

C  |  X I 


2 


c 


T 


for  |  x  |  <_  c 
for  | x l  >  c 


(2.21) 


and 


*(x)  = 


-c  for  x  <  -c 


for  -c  <  x  <  c 


for  x  >  c 


The  estimator  has  thus  the  form 


(2.22) 


Ew-Cx.-Q)  =  0  (2.23) 

with  weights  w.  mapping  errors  outside  the  confidence  boundaries 
on  the  confidence  boundaries  (Winsorizing) ,  which  is  similar 
to  the  confidence  mapping  proposed  by  Newcomb  (1886). 

2.4.4. 2  Robust  Likelihood  Functions  in  the  Class  of  Linear 
Estimates 


The  condition  for  a  linear  estimator  is  that  the  likelihood 
function  has  no  power  higher  than  two.  The  condition  for  a 
unique  optimum  is  that  the  likelihood  function  is  convex.  These 
two  conditions  require  that  the  likelihood  function  has  the  form 

p(x)  *  ax2  +  b | x |  +  d.  (2.24) 

Any  robust  likelihood  function  must  therefore  be  a  linear  combina¬ 
tion  of  Eq.  (2.24)  (see  Figure  2.1). 
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2. 4. 4. 3  Robust  Likelihood  Estimates  in  a  Class  of 
Distribution  Functions 

In  Section  2. 4. 4.1,  we  discussed  Huber's  robust  M-estimator 
for  a  given  distribution  or  density  function.  In  the  following 
the  theory  is  extended  for  the  case  where  the  density  function 
is  unknown,  but  it  is  known  to  which  class  of  density  functions 
it  belongs, 

f  e  F,  (2.25) 

where  F  is  a  class  of  density  functions,  e.g.  symmetric  density 
functions,  and  f  is  a  density  function  out  of  F,  described 
by  the  parameter  4> , 

f  =  FU).  (2.26) 

The  robustified  likelihood  function  is  then 


o (x,e ,4) 


i  (f  (x,e  ,<j>))  for  | x - 9 1  <_  c 
r(f(x,e,<j>))  for  |  x  -  e  |  >  c 


(2.27) 


where  c  is  the  confidence  boundary,  l  is  a  likelihood  function 
for  innovations  x-e  inside  the  confidence  bounds  (e.g.  inverse- 
log  likelihood  function  for  purely  exponential  distributions 
f),  and  r  is  a  robustified  likelihood  function  outside  the 
confidence  boundaries,  that  reduces  the  sensitivity  of  the  esti¬ 
mator  due  to  outliers  and  deviations  of  random  values,  x,  from 
the  assumed  distribution. 

The  optimal  estimate  6  of  the  parameter  §  is  the  solution 
of 

I  P ,0,£)  3  min  (2. 28) 
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If  p(x,0,<t>)  is  convex  and  the  derivative 

<l>(x,9,(t)  =  jQ  p(x,0,<j>)  +  p(x,0,4>) 

is  continuous  and  bounded,  Eq.  (2.28)  reduces  to 
Z  Ij/(xi  ,0  ,i)  =  o, 

and  in  the  case  of  a  location  parameter  to 

A 

z  i|>(x^-6  ,$)  ■  o 


(2.29) 


(2.30) 


2.4.5  Robustness  of  Estimators  for  Finite  Density  Functions 

Finite  density  functions  have  nonzero  values  only  for  random 

values  x  <  x  <  x„„  .  Therefore,  maximum  likelihood  functions 
ram  —  -  max 

of  estimators  for  random  variables  with  finite  density  functions 


have  the  form 

i 

zero 

for 

•x  <  x  • 

1 

nun 

o(x,e)  =  ■ 

l  (x  ,  8  ) 

for 

xmin  1  x  1  xmax 

(2.31) 

zero 

for 

x  >  xmax 

Hence,  maximum  likelihood  estimators  for  finite  random  variables 
have  natural  confidence  boundaries,  and  values  outside  the  confi¬ 
dence  boundaries  are  considered  outliers  and  rejected  for  the 
estimation  problem.  The  robustness  of  maximum  likelihood  esti¬ 
mators  for  finite  random  variables  is  inversely  proportional 
to  the  difference  between  the  confidence  boundaries. 

2.5  APPLICATION  TO  DYNAMIC  SYSTEMS 

The  concepts  presented  above  for  static  systems  may  be 
extended  for  dynamic  systems.  This  section  shows  the  difficulties 
in  this  oxtension  and  how  some  of  the  problems  are  resolved. 
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Consider  a  dynamic  system  in  discrete  form  with  state  x, 
controls  u,  outputs  y,  and  parameters  9. 

xk+^  =  f (xk,uk, 6)  +  wk  k=0 , 1 , 2 . . .N- 1 

yk  =  h<-xk,uk>6^  +  k  =  l ,2 , . . .  ,N 

wk  and  are  process  and  measurement  noise  sources.  The 

standard  procedures  for  the  estimation  of  parameters  0  are 
based  on  the  assumption  that  wk  and  vk  are  white  with  Gaussian 
densities. 

In  general,  this  problem  can  be  solved  as  an  estimation 
problem  along  the  time-coordinate  in  n+l-dimensional  space.  The 
n-dimensional  state  x  of  the  dynamic  system  is  known  with  some 
uncertainty  nQ  at  time  t=tQ.  Using  the  m-dimensional  parameter 
state  of  the  sytem  the  state  is  predicted  for  the  time  t=ti=t0+At0« 
Using  0,  the  uncertainty  H0  is  mapped  into  the  n-dimensional 
plane  at  t=t^,  no+  n^.  Because  of  parameter  errors  and  disturbances 
on  the  system,  the  uncertainty  increases  to  IT^  =  ninIp‘  A  measure 
y j  is  taken  and  an  innovation  =  y1-h1(x1,0)  is  formed. 

The  problem  we  face  now  is  to  decide  where  the  confidence  region 
lies.  It  is  better  to  expand  the  confidence  region  about  the 
predicted  state  or  the  measurement,  or  should  we  expand  it  about 
the  updated  state,  formed  using  the  assumption  that  the  measurement 
as  well  as  the  predicted  state  are  included  in  the  assumed  proba¬ 
bility  space?  As  long  as  the  confidence  region  nc  of  the  measure- 
ment  is  included  in  the  state  uncertainty,  n  c II  ,  it  cannot 

be  detected  whether  a  particular  measurement  falls  within  its 

? 

tail-statistics  if  it  is  also  included  in  IT  . 

We  shall  first  look  at  the  case  where  the  process  noise  n^clT0. 

2  c 

Then,  after  a  sufficient  number  of  measurements  IT  c  n  ,  and 
the  confidence  region  can  be  expanded  about  the  predicted  state  x 
and  confidence  mapping  can  be  applied  to  the  innovation  sequence,  v 
(Hampel,  1971;  Papantoni-Kazakos/Gray ,  1978). 


(2.32) 

(2.33) 


20 


2.6  NO  PROCESS  NOISE 


When  there  is  no  process  noise,  estimators  which  are  robust 
for  deviations  from  the  nominal  measurement  noise  probability  den¬ 
sity  may  be  designed.  Following  along  the  lines  of  the  static  es¬ 
timation,  a  cost  functional  is  defined  in  the  following  manner, 

N  T  -1 

J  *  k^l  ^  R  ^k^k  +  l02lR(ek^  (2.341 


The  weighting  R  is  a  function  of  the  error  itself.  If  the 
errors  in  the  measurements  are  known  to  be  uncorrelated,  the 
matrix  R  is  diagonal.  The  functional  forms  for  the  diagonal 
terms  of  R  must  be  such  that  the  dimensions  of  the  particular 
measurement  do  not  affect  the  confidence  bound.  Therefore 


ek(i) 

5k(i)  -  ~ -  (2.35) 

k  <4 

Rii(ek)  -  o-p(5k(i))  (2-36) 


where  ek(i)  is  the  ith  component  of  the  error  vector  at  time  k. 
The  function  p(*)  is  the  same  for  each  measurement.  To  estimate 
a^,  we  differentiate  Eq.  (4.3)  with  respect  to  and  set  the 

result  to  zero  (assuming  p(5k(i)1  is  differentiable  everywhere). 


3I  N 

_LL  .  j 

3ai  k-1 


ej(i> 


T 


[a?p(Ck(i))]2  ta-PUk(i))] 


2  3p(Ck(i)) 


3ot: 


2aip  (Sk(i)) 


=  0 


(2.37) 
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_ A 


and 


3p(5k(i))  3o(Ck(i)) 

3o^  35^ (i) 


Mi) 


3p (5k(i)) 

*  - , . . — 

35kU) 


Therefore 


with 


Ck(i)  3p  CCk (i) ) 
Wk(i)  =  2  '  p Uk(i) )  35k(i) 


(2.38) 


(2.38) 


If  p  is  a  mild  function  of  the  variable  ?k(i),  the  above 
equation  may  be  simplified  to 


2 


a 


i 


i  ?  4<-» 


(2.40) 


The  estimation  of  parameters  based  on  the  likelihood  function  of 
Eq.  (2.34)  must  therefore  be  done  in  two  steps. 

(1)  Select  an  and  perform  a  Newton-Raphson  optimization 

to  estimate  the  system  parameters.  In  this  step,  deriv¬ 
atives  of  p(£k(i))  with  respect  to  system  parameters 
may  be  ignored. 
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(2)  Use  the  above  equations  to  estimate  a^.  Repeat 
the  procedure  till  convergence  occurs. 

This  technique  may  be  modified  for  the  case  of  gaussian  pro¬ 
cess  noise  and  nongaussian  measurement  noise. 

2.7  PROCESS  NOISE 

To  develop  procedures  for  Non-Gaussian  process  noise,  we 
assume  that  its  distribution  is  generally  normal  with  some 
contamination  from  another  normally  distributed  noise  of  higher 
standard  deviation.  Suppose  at  any  point  in  time,  the  process 
noise  is  a  realization  of  the  higher  variance  random  variable. 

The  state  at  the  next  point  will  be  highly  perturbed  from  its 
expected  value.  This  perturbation  may  reduce  in  the  following 
step,  if  the  system  is  stable.  Nevertheless,  its  effect  will 
be  felt  in  a  number  of  subsequent  steps.  This  is  very  different 
from  the  case  of  the  non  Gaussian  measurement  noise  where  the 
effect  of  each  deviation  is  felt  only  in  the  particular  step. 

The  previous  discussion  points  to  two  aspects  of  problems 
with  nongaussian  noise.  The  nongaussian  effect  (particularly 
the  case  when  one  basic  distribution  is  contaminated  by  another 
distribution  with  higher  variance)  at  one  sample  point  lasts 
over  many  future  points.  This  complicates  the  problem.  However, 
because  of  this  reason,  it  appears  that  more  information  is 
available  to  isolate  nongaussian  behavior. 

The  critical  problem  in  the  development  of  an  algorithm 
is  to  ensure  that  the  nongaussian  effects  at  one  point  do  not 
influence  the  estimates  of  noise  sources  at  other  points.  The 
most  direct  procedure  to  achieve  this  objective  is  to  identify 
the  system  parameters  as  well  as  the  process  noise  variable  w 
without  making  any  a  priori  assumption  about  w.  In  other  words, 
w  is  assumed  to  be  a  completely  unknown  input  signal  with  unspecified 
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characteristics.  With  no  assumptions  on  w,  it  is  possible  to 
find  its  estimate  only  when  the  number  of  measurements  exceeds 
the  number  of  process  noise  sources. 

Let  w^  (k  *  1,2,3,...N)  be  the  estimate  of  w  based  on  no 
a  priori  information.  Using  this  estimate,  it  is  possible  to 
select  the  values  of  k  for  which  w^  may  be  assumed  to  come 
from  a  Gaussian  distribution.  A  Gaussian  covariance  may  be  speci¬ 
fied  for  process  noise  at  those  points  while  the  other  points  are 
assumed  completely  unknown.  This  procedure  is  repeated  till  con¬ 
vergence  occurs. 

The  difficulty  with  using  this  procedure  is  its  integration 
with  the  parameter  estimation  algorithm.  It  appears  that  the 
procedure  for  specifying  w^  has  to  be  reinitialized  following 
each  parameter  iteration.  The  computation  time  requirements  may 
be  unacceptable.  It  may  be  sufficient,  however,  to  update  w, 
once  after  several  iterations,. 
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2.8  EXAMPLE  OF  ROBUST  LIKELIHOOD  ESTIMATION 


Many  problems  have  been  solved  by  the  use  of  robust  maximum 
likelihood  methods.  This  section  presents  an  example  to  demonstrate 
the  improvement  in  estimation  accuracy  which  may  be  achieved 
by  the  application  of  these  methods.  Monte  Carlo  methods  are 
used  to  demonstrate  a  quantitative  reduction  in  variance. 

Consider  the  following  nonlinear  system. 


*1  ■  911  X1  *  9 12  X1  *  ®13  x2 


*  ®14  x2  *  U1 

x2  e  e21  X1  +  e22  xl2  +  623  x2 


*  924  x2  *  u2 


with  parameters 


e  = 


-1.0  .01  .2  -.02 

-.15  .03  -1,0  .015 


(2.42) 


(2.43) 


and  forcing  function 
'0 


u  = 


t  e 


- .  It 


(2.44) 


was  observed  by  measurements 
y  =  x  +  n. 


The  errors  in  the  measurements  are  a  mixture  of  normally  distributed 
random  variables 

/  r  a  p  a  ■»  \ 

(2.46) 


nl  *  N  °»  0 


.05 


:.i) 


and 


.5  0 

n2  *  N  l°»  I  0  10 


,1) 


(2.47) 
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90%  of  the  time 


(2.48) 


n 


nl 

10%  of  the  time 


Figure  2.2  shows  maximum  likelihood  and  robust  likelihood  esti¬ 
mates  of  the  parameters  8^  and  0 respectively,  for  differ¬ 
ent  n1e^,  the  space  of  all  random  variables,  defined  by  Eq. 
(2.48).  The  variances  of  the  parameter  estimates  and  the  output 
errors  are 


MAXIMUM 

LIKELIHOOD 

ROBUST 

LIKELIHOOD 

1 

MAXIMUM  LIKELIHOOD 
ROBUST  LIKELIHOOD 

.095 

.033 

2.9 

°023 

.201 

.050 

3.7 

.02758 

.02804 

.98 

°*2 

9.205 

9.205 

.99 

[NUMBER  OF  DATA  POINTS  N=50] 


The  robust  likelihood  estimation  gave  parameter  estimates  for 
this  example  that  are  2.9  and  3.7  times  better  than  the  parameter 
estimates  of  the  maximum  likelihood  estimator  with  an  average 
increase  in  output  error  of  less  than  2%  and  a  maximum  increase 
of  output  error  of  less  than  5%. 

When  the  number  of  parameters  is  large  or  if  some  parameters 
are  marginally  identifiable,  further  improvements  in  estimation 
accuracy  may  be  achieved  by  the  use  of  robust  procedures. 
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2.9  CONCLUSIONS 


Accuracy  of  estimates  resulting  from  experimental  data 
may  be  significantly  enhanced  by  applying  robust  techniques. 

These  techniques  require  modification  to  the  least  squares  type 
of  performance  index  in  general  and  to  the  maximum  likelihood 
method  in  particular.  Huber's  work  [15-17]  provides  good  background 
for  the  development  of  robust  methods  for  dynamic  systems. 

Though  this  section  has  been  concerned  with  parameter  estimation, 
similar  methods  apply  for  state  estimation  also.  Application 
of  these  methods  could  provide  significant  improvement  in  dynamic 
state  estimation  when  Kalman  filters  are  used. 


III.  PARAMETER  ESTIMATION  WITH  NON- WHITE  ERRORS 


3.1  INTRODUCTION 

Measurement  errors  are  generally  correlated  due  to  instrument 
dynamics,  finite  moments  of  inertia,  dynamics  and  environmental 
influences.  The  environmental  influences  disturb  the  system 
and  the  measurement  instruments  at  the  same  time  or  with  some 
time-delay  and  hence  correlate  the  system  noise  and  the  measurement 
error . 

The  nonwhiteness  and  the  cross-correlation  between  system 
and  measurement  errors  correlate  the  innovation  process  in  the 
Kalman  filter  unless  these  effects  are  compensated  for.  It 
is  shown  that  estimates  will  have  increased  covariances  but 
will  not  be  biased.  Techniques  for  including  the  effects  of 
known  measurement  error  correlations  in  state  estimation  problems 
have  been  considered  by  several  previous  authors  (Kalman,  1963; 
Henrikson,  1968;  Jazwinski ,  1970).  This  chapter  describes  state 
as  well  as  parameter  estimation  methods  for  non-white  noise. 

3.2  PROBLEM  FORMULATION 

In  non-white  noise  problems,  the  spectrum  of  the  process 
and  measurement  noise  sources  is  often  not  known.  Without  a  priori 
knowledge  of  the  noise  power  spectrum,  one  of  the  following 
is  required: 

(1)  Estimate  the  correlation  and  adapt  the  state  and  param¬ 
eter  estimator  to  the  determined  correlation. 

(2)  Sacrifice  some  of  the  sensitivity  of  the  estimator 

to  make  it  insensitive  (robust)  against  a  set  of  pos¬ 
sible  correlations  of  the  worst  case  correlation. 

In  the  first  approach,  an  additional  spectrum  estimation  step 
is  required.  Once  the  spectrum  is  known,  procedures  for  the 
known  spectrum  can  be  used  for  state  and  parameter  estimator. 
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The  second  approach  leads  to  a  worst  case  estimator.  A 
certain  efficiency  is  lost  if  the  noise  is  white  but  the 
estimators  are  robust  with  non-white  noise,  particularly  when 
the  noise  has  a  spectrum  similar  to  that  of  the  signal. 


3.3  ESTIMATORS  FOR  KNOWN  CORRELATIONS  BETWEEN  ERRORS 


Let  the  discrete  linear  system  (A  non-linear  system  can 
be  approximated  by  piece-wise  linear  systems)  be 


ck*i  xk  *  rk  wk 


with  measurement  outputs 


=  Hk  xk  +  vk 


0,1,...  ,N- 1 , 


,  k  =  1 , 2  , .  .  .  ,N  , 


(3.1) 


(3.2) 


where 


xv  e  Rn  is  the  system  state, 


yk  e  Rm  is  the  measurement  of  the  output  H^x^ 
w.  e  Rp  noise  sequence  of  the  system 


v,  e  Rm  noise  sequence  of  the  measurements 


xk  =  E[xk] 

E[wk3  =  E[vk+1]  = 

Define  the  correlation  matrices 


Rxx(k,j} 

— 

E[(xk-xk)(xj- 

= 

E[wkwjT] » 

RvvO'.j) 

= 

E[vkv.T], 

R„y(k,j) 

= 

E[wkVjT], 

p  k 

Pk+1 

A 

E[(xk+l-xkk+i 

k  =  0,1,. . .  ,N- 1 


77  kT, 


(3.3) 


:.k 
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where 


-  Et*k.iiyki- 


For  the  white  noise  ease  with  a  priori  known  covariance, 


E(“k“jT] 

=  Rww^k*k^kj, 

E[vkv.T] 

-  Rvv^>k^kj, 

E!"kvjT' 

-  0, 

the  minimum  variance  filter  is  between  measurements. 

Ak 

xk+l 

,  "k 
*k  xk 

(3.4) 

pR  = 

Fk+1 

E[(<t>k(xk"xk^  +  rkwk^*k(wk'xk)  +  rkwl^  ' 

!Ykl 

pk  = 

Fk+1 

k  T  k  T 

Wk +  rkC  )rk  ’ 

(3.5) 

and  at 

observations 

xk  = 

xk 

xk_1  +  Kk(^k  •  Hk 

(3.6) 

k 

Pk  = 

"IT1  -  Kk  Hk  prx  ’ 

(3.7) 

where 

Kk  - 

pk"luT  ru  Pk”^H  Y  +  R  kl”^ 

Pk  H  klHkPk  Hk  vv  J 

(3.8) 

is  the 

Kalman  gain. 

3.3.1 

Correlated  State 

:  and  Measurement  Noise  with  A  Priori 

Known  Covariance 

T 

For  the  a  priori  known  covariance,  Efw^Vj  ]  =  Rwv^k,k^5kj  ^  °» 

between  state  and  measurement  noise  an  optimal  estimator  can  be 
assigned  by  the  method  of  Kalman  (1963) ,  and  Jazwinski  (1970) . 
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Let  U,  =  {Uj,...,uk>  be  an  orthonormal  basis  for  the  measurement 
Y^.s.t.  E[u^ukT]  =  I  5 .  Since  the  best  estimate  xk+1  of  xk+1 

is  the  orthogonal  projection  of  xk+1  into  Y  ,  it  can  be  described 

by  *k  k  t 

xk+l  =  ?=1  E^xk+lui  lui  *  (3‘8) 

and  with  (3.1) 

k 

xk+l  =  \=1  E[(*k‘xk  +  rkwk)uiT]ui  +  Etxv+lukT]uk-  (3*9) 

Since  wk  is  not  sequentially  correlated,  it  is  independent  of 
Yk_  j ,  and 


*k 


k+1 


(3.10) 


$k  xk  1  +  E ^xk+lukT^uk" 

«k- 1 

Because  the  innovation  vk  =  yk  -  Hkxk  is  orthogonal  to  Yk_1 
but  included  in  Y, 


E[xi.iukT'uk  *  K*kt>rk-HkiS'1i  - 


f3.ll) 


The  error  in  the  estimate, x£+1  ^  xk+1  -  xk+1, is  independent 

k  T 

of  the  measurement  y^,  hence  ]  =  0*  This  gives, 


K\  -  t*kpk'V  *  rkRwv jtVk'V  *  w1  (3-I2) 


W1 


ith 


Pk*l  -  EtXk+l  Xk-1T1  =  *A~W  +  FkRwwrkT 


(3.13) 


“wiTV  +  vVj- 


The  filter  for  correlated  state  and  measurement  noise  is  then  , 
Measurement  update: 


'k 

xk 

3  'i'1  +  Kk^k  -  Vk*1) 

(3.14) 

pkk 

-  pjr1  -  KkHkpk_1 

(3.15) 

Kk 

3  pk'VfVk'V  +  v-1 

(3.16) 
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Time  update: 


xk+l 

Pk 

Fk+1 


♦  rkRwv1(tHkpk'V  *  ■  Vk"1)  (3-17) 


*kPkV  *  rkRwwkrkT  -  rkR™klHkPk'lHkT  *  W1  C5-18) 


rr^  rT'  rT^  T' 

r  r  -  <j>,  k,  r  lr  -  r  R  k.  V  . 
wv  k  ^k  k  wv  k  k  wv  k  yk 


3.3.2  Sequentially  Correlated  Measurement  Noise  with  A  Priori 
Known  Covariance 


Let  the  measurement  noise  be  correlated  through  the  Markov 
sequence 


Vk+1  =  Vk  +  uk 


(3.19) 


with  driving  noise  u.  ^  N(0,R,tll  )  uncorrelated  to  the  state 

K  UUi 

noise  w^ ,  Et^w^1]  =  0.  Kalman  (1960)  solves  this  problem  by 
augmenting  the  Markov  sequence  (3.19)  to  the  state  equation  (3.1): 


with  noiseless  measurements 

ya  *  Ha  xa  (3.21) 

y  k  M  k  x  k* 


In  the  formulation  (3.20),  (3.21),  the  error  covariance  of  the 
estimate  becomes  singular.  To  overcome  this  problem,  Bryson/ 
Henrikson  (1968)  and  Bryson/Ho  (1969)  used  the  difference  of 
successive  measurements,  which  has  additive  white  noise. 
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(3.22) 


with 


and 


R 


uu, 


’k  =  ^k-l^k-l 


?k  =  ^k- l^k- 1  +  u*k- 1 


Hk- 1  =  Hk  *k-l  '  ^k- 1  Hk- 1 


Uk-1  =  Hk  rk-i  Wk-1  +  Vi 


(3.23) 


k- 1 

The  system  is  now 


-  EK-l“*k-l  1  -  V,lH,r,1\T  *  Ruu 


xk+l  =  ^k  Xk  +  rk  Wk 


k-1 


(3.24) 


ck+l  "  Hk  xk  +  Hk+Irk  wk  +  uk  »  k  -  1 


H*k  =  Hk+l*k  "  Vk  Hk 
T 

with  vn  -v-  N(0,Rvv  )  and  E[x0vn  ]  =  0 ,  we  get  from  the  augmented 


vovo 


equations  (2.20,  2.21) 


-  *„  *  p°  hitihipohit  * 

0  (3.25) 

PJ  •  Po  -  PoH1TIH1PoH1T  *  "vv/Vo. 

The  filter  for  system  (2.34)  is  found  by  the  method  of  Section  (3.3.1), 

*i:\  ■  (3-26) 

•K  PW  *  RuuI  1[5k*l  -  $ 


pk+1  _  Dk.  T  r  o  r  T 
Pk+1  ~  *kPk^k  kRww^  k 


[*kpkkHkT 

+  r ,  r 
k  wu 

RwwVl 
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(3.27) 

*r>  h„*T  D*  i‘lru*D  ^  T  + 


with 


Rwu*  =  RwwkrkTHk+lT 


C  3 . 2  8) 


Ruu  =  Hk+irkRwwkrkTHk+lT  +  Ruuk 


The  solution  for  time  correlated  measurement  noise  of  continuous 
systems  is  shown  by  Bryson/Ho  (1969)  p.  405ff,  Mehra/Bryson  (1968), 
Bryson/Johansen  (1965) 


3.4  CORRELATED  ERRORS  WITH  UNKNOWN  COVARIANCE 


3.4.1  Impact  of  Correlations 


Comparing  the  optimal  estimator  for  white  noise  (3.3  -  3.7) 
with  the  estimators  for  correlated  measurement  and  state  noise 
(3.14  -  3.18)  and  sequentially  correlated  measurement  noise,  we 
observe  the  following: 

(1)  Contrary  to  the  white  noise  case,  the  time  update  of 
the  optimal  filter  for  the  correlated  noise  requires 
feedback  of  the  innovation  sequence,  y  -  Hx. 


(2)  If  the  measurement  noise  couples  with  the  state  equations, 
the  Kalman  gain  for  the  optimal  filter  is  different  than 
the  gain  for  uncorrelated  errors. 

To  further  investigate  the  impact  of  correlated  noise  on  the 
estimation  problem,  the  worst  type  of  correlation  is  determined 
in  Section  3.4.2,  and  its  effect  on  a  parameter  estimation  that 
assumes  uncorrelated  noise  is  investigated. 


3.4.2  Worst  Type  Correlation  Function 


We  first  consider  a  system  without  process  noise,  but  with 
additive  measurement  noise.  Let  the  system  be  defined  by 
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x  =  £  (x,  0 ,  u,  t) 


(3.29) 


y  =  h  (x,  9,  u,  t)  +  v, 


(3.30) 


where  x:  state  space 
9 :  parameters 

u:  control 

v:  additive  measurement  noise  with 

covariance  R. 

To  obtain  the  maximum  likelihood  estimate,  we  minimize 

J  -  /  (y-h)TE‘1(y-h)dt,  tmin  <_  t  <  tmax  (3.31) 

The  worst  case  noise  maximizes  estimation  error  and  hence 
minimizes  the  information  matrix.  Since  functional  char¬ 
acteristic  of  non-white  noise  is  best  described  by  its 
frequency  distribution,  we  transform  our  estimation  pro¬ 
blem  into  the  frequency  domain.  Assuming  all  functions  have 
Fourier  transforms,  the  cost  function  in  the  frequency  domain 
is, 

o°  ^ 

Jf  -  /  (y-h)  (y-h)  du  .  (3.32) 


We  now  find  a  frequency  distribution  of  the  noise  spectrum  such 
that  the  information  matrix,  M  ,  is  a  minimum  and  the  co- 
variance  of  the  noise  is  a  constant. 
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min 

M 

0) 

max 

/ 

Svv 

(») 

w 

max 

the 

constraint 

M  = 

“max  ( 

“max  \ 

SvvM  dw  =  constant, 


(3.34) 


-1 


3  6  VV 


M  +  A  svy 


|  du  ■ 


const.  (3.35) 


A  good  measure  of  matrix  M  is  its  trace, 


£r  M  =  /  tr(  ||  S^W1  ||  ♦  *  Svv(„))  d»  -  c 


(3.36) 


tr  M  is  a  minimum  if 


3trM  _ 


TST,,  '  1  vuVV  3  0  3  0  “vv 

VV  W 


-l3h  3h  c  -*)*  +  A*}  do,  =  0 


(3.37) 


For  the  integral  to  be  zero,  the  expression  under  the  integral 
has  to  be  zero.  We  get,  therefore, 


o  e  *  c*  ^  f  \  _  3h  (  U) )  3h  (  U) )  _  .  u  f  ^ 

s,r„(“0  A  (oi)  =  Jq  -  Je -  :  H(u)  , 


VV  VV 

dim  0  >  dim  v  > 


(3.38) 


/  Svv(w)  dw  =  const  =  S. 
0) 


(3.39) 


Since  both  Svv(<d)  and  H(ui)  are  symmetric,  A  also  has  to  be 
symmetric,  and  we  can  decompose  the  equation 

(Svv^>  CS„v»V  -  H  -  H*  H**; 


(3.40) 


hence , 


Svv(»)  =  A'4  H*5 


(3.41) 
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Using  the  constraint 


A'*5  /  H*(u)  du,  =  S 

U) 


(3.42) 


the  language  multiplier  is 

A*5  =  {;  H^w)  dui)  S'1  .  (3.43) 

(ii 

Then  the  spectral  noise  distribution  for  the  extremum  of  the 
information  matrix  is 

.  t  -1  . 

S  (u)  =  [{/  H^aiidulS'1]  H*(w)  (3.44) 

vv  OJ 

S  fa>)  =  Sf/H^ftOduJ'V^O))  .  (3.45) 

vv  01 

Writing  the  equation  in  the  form 

( /S  (oi)dui}~  XS  (oi)  =  {/^((oJduJ'^Cu)  (3.46) 

U)  vv  vv  u> 

shows  that  the  information  matrix  M  is  a  minimum  if  the  dis¬ 
tribution  of  Svv(ui)  is  the  same  as  the  distribution  of  H^ui)  . 

For  an  estimator  of  this  type,  the  estimates  follow  the  weighted 
noise  in  a  maximum  fasion,  as  observed  in  many  estimation  prob¬ 
lems.  The  residual  errors  of  the  estimates  will  then  be  mini¬ 
mized  and  give  the  illusion  of  a  good  parameter  estimate;  even 
so,  the  parameters  just  follow  the  noise.  Clearly,  estimation 
errors  may  be  significantly  increased  if  the  noise  is  the  same 
frequency  range  as  the  signal. 

3.4.3  Effect  of  Worst  Case  Noise  Distribution  on  Estimation  Error 


The  information  matrix  is 


tr 


^max  9h  Q 
“min  W  vv 


1  9h 


dui 


(3.47) 


40 


and  the  trace  of  the  information  matrix  is 


*_  M  _  /-max  „w3h  -  -l3h,  , 

tr  Mf  -  /  tr{—  Svy 

“min 


* 

_  ,max  ^rSh  3h  c  -L  , 
-  /  tr{—  —  b  .  }d 


u  . 

min 


30  36  VV 


(3.48) 


tr  Mf  =  /max  trfHMS^'1^)}  dw 

0)  . 
mm 


The  worst  case  noise  of  the  optimal  filter  is 


Svv(uO  =  S  (H  **)  "  lHJj (to) 

S^^o))  =  (H^U))'1!!  V1  d 


tr  M£  =  /max  t r { H*5 ( a) ) H  V1}  du> 

“min 

tr  =  tr  {{/H^CuO  dcH/H^uOdu^S'1} 


(3.49) 


For  the  filter  assumption  that  the  noise  is  white,  SVV(«)=SS 
constant,  the  trace  of  the  information  matrix  is 


tr  Mf  =  /max  tr{H  (u)  S'1}  doi 
“min 


=  /max  tr(Hlj(a))His*(u))S'1}  du 
“min 

tr  M  a  =  tr{  {/max  (H^uOH*5*^)  }  du>  S'1}  . 


(3.50) 


Since  the  product  of  integrals  is  larger  or  equal  to  the  integral 
of  a  product, 

tf  Mf  >  tr  .  (3. 51) 
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Therefore,  the  error  in  the  parameter  estimate  increases  if 
white  noise  is  assumed  for  an  estimation  problem  with  non¬ 
white  errors.  A  good  measure  for  the  degradation  of  the 
maximum  likelihood  estimator  if  worst  type  noise  is  present 
and  white  noise  is  assumed  is, 

i  tr  =  tr  {  [H'^H'*8*]'1/  H(«)  dw}i  .  (3.52) 

p  r  r  0)  p 

where  p  is  the  number  of  parameters.  The  error  covariance 
of  the  parameter  estimate  6  increases,  therefore,  by  the  noise 
distribution  factor 


tr{  [H'^H 


‘X/H(w)d} 

CD 


3.4.4  Example:  First  Order  System 


(3.53) 


Let  the  parameter  sensitivity  be 

1 


3h,  . 

ae(w) 


1  + j  (0 
0 


with  the  noise  spectrum 


S(u>)  = 


C 

C, 


Vl+cD: 


for  0  <  CD  <  CD, 


max 


for  <d  >  id 


max 


for  (d  < 


id. 


w_ 


for 


w 


max 

<  CD  <  CD_ 


max 


max 


(3.54) 


(3.55) 


and  the  covariance 


CD_ 

/ 

0 


max 


S(cd)  d<D 


3h 


(3.56) 


■1  3h 


The  trace  of  information  matrix  is,  tr  M  *  /  tr  (|^  S^U)  jg  }da)  . 


In  the  white  noise  case  the  spectral  distribution  is  constant,  S(w) 
The  trace  of  the  information  matrix  is,  therefore, 


R 


max 


tr  M  =  / 

W  CD 


max  1 


mm 


1+cD 2  R 


“max  j  _  “max  _ _ _ 

-  do)  =  -  arctan  u> 


max 


(3.58) 


(3. 
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For  a  mixture  of  white  and  non-white  noise: 


for  0  <  o)  <  (i) 


w. 


S(u>)  = 


with  C 


C/l+w 2 
wmax 

/l+TT? 


for  w 

R 


max 


<  U)  <  tu 

w  _  w 

max  max 


(3.59) 


a)  +  /l+oi  Arshfco  /l+oi  -  w  /1+oj  1 

wmav  1  max  w  w  maxJ 

max  max 


For  the  white  noise  section, 


1  ww  1  1 

c  'w  dw  -  c  arctan  \  ; 


(3.60) 


non-white  section. 


1  “max  d(il  «  1  /max  — 1 —  dui  =  i  Arsh(w) 

C1  «w  1+“  C1  «w  /TO  C1 


U) 


max 


w. 


w 


"w+/1+“w2  Arsh  “  Arsh  «* 

tr  Mnw  =  - r -  [arctan  u>w  +  with 

/1  +  U)  * 
w 


(3. 


U)  =0)  /1  +  U)  2  -  /1  +  U)  z 

max  w  w  max 


The  error  covariance  of  the  estimate  for  the  non-white  case 

relative  to  the  white  noise  case  is  then  trMnw "*/trMw "  • 

Figure  3.1  shows  this  covariance  ratio  for  different  degrees 

of  non-whiteness,  indicated  by  the  ratio  ojw  /a)mav-.  In 

w  max 

max 

this  example,  the  error  covariance  for  the  worst  type  of  noise 
spectrum  is  64%  larger  than  the  covariance  for  white  noise. 

For  higher  order  systems  non-white  noise  will  degrade  parameter 
estimates  even  more. 

Non-white  noise  has  therefore  a  significant  influence  on 
parameter  estimates  and  should  be  considered  in  estimation  procedures. 
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Figure  3.1 

DEPENDENCE  OF  ERROR  COVARIANCE  OF  ESTIMATE  ON  WHITENESS  OF  NOISE 


3.5  ESTIMATION  OF  CORRELATION 


If  large  data  sets  are  available  the  distribution  function 
and/or  density  function  may  be  estimated.  In  the  following, 
a  method  for  estimating  the  distribution  function  is  outlined; 
the  density  function  can  be  estimated  in  a  similar  way. 

A 

The  estimate  F(x)  of  a  distribution  function  F(x)  may 
be  described  by  a  parametric  form, 

F(x)  =  f  (ot  ,<(>  (x) ) ,  (3.62) 

where  x  is  a  vector  of  unknown  coefficients  and  <Hx)  is 
a  set  of  independent  functions  of  x,  e.g.  moment  functions. 

The  coefficients  a  may  then  be  estimated  by  minimizing 
a  suitable  function  of  the  error  f(<*,4>(x))-F(x).  In  the  least 

A 

square  sense,  the  estimate  of  a  of  a  is, 

a  =  arg  min  E  [f  (x,  <J>( x)  )-F(x) ]  2  (3.63) 

a 

If  f(x,  (x))  is  a  linear  function  of  a, 

f  (a  ,<t>  (x)  =aT<J>(x)  ,  (3.64) 

the  least  squares  estimate  (LSE)  of  a  becomes 

a  *  [E(<p(x)<l>T(x)  }]~1Efd>(x)F(x)  >.  (3.65) 

For  numerical  evaluation,  it  suffices  generally  to  replace  the 
estimate  by  an  integral.  The  estimate  of  a  becomes  then, 

a  *  [/  <j»(x)4)T(x)dx] /  <j>  (x)F(x)dx, 

R  R 


(3.66) 


where  ndx  =  dx^ ,  and  R  is  the  range  of  x  where  the  estimate 
is  desired.  Kashyap/Blaydon  (1968)  developed  a  gradient  algorithm 
which  sequentially  updates  estimates  a  when  new  data  are  avail¬ 
able. 

For  the  estimation  of  mixtures  of  normal  or  exponential 
density  functions,  it  is  numerically  easier  to  estimate  the 
logarithm  of  the  density  function,  by  either  maximizing  a  regres¬ 
sion  criteria, 

*  E  [In  f(a,x)]  =  /f(x)  In  f(a,x)dx, 

(3.67) 

/f(a,x)  dx  =  1 

or,  equivalently,  minimizing  an  error  or  information  criteria 
(Young,  Coraluppi,  1970), 

j  -  i  [In  — ]  =  /  f  (x)ln  —  dx  .  (3.68) 

f(a,x)  f (a ,x) 

The  estimated  density  function  may  now  be  used  to  design  a  maximum 
likelihood  estimator.  The  sequential  correlations  of  the  residuals 
of  this  estimator  can  then  be  computed  and  adaptively  or  iteratively 
included  in  the  estimator. 


3.6  BOUNDING  LINEAR  ESTIMATOR 


In  many  measurement  systems,  auto  correlations  of 
measurement  errors  are  known,  but  little  information  is 
available  about  cross  correlations  between  error  sources, 
correlations  between  system  and  measurement  errors,  and 
sequential  correlation  (non-white  errors).  Hence,  an  es¬ 
timator  that  includes  a  priori  knowledge  of  cross  correlations 
and  sequential  correlations  cannot  be  designed.  Estimators 
for  such  systems  can  be  designed  with  an  upper  bound  on 
estimation  errors. 


3.6.1  Bounding  Linear  Miminum  Variance  Estimation 

Let  us  assume  a  system, 

xk+l  =  *k  xk  +  Fk  wk’  k=0'1 . N_1 

with  output  measurements, 

"  Hk  xk  ^  j  k=l , 2 , . . . ,N , 

where  xk  e  Rn;  yk,  vk  e  Rm;  wk  e  Rp. 

A 

Then  a  bounding  estimator  provides  as  estimate  x  of  x,  such  that 
E[(xk'xk)(xk'xk)T]  i  PB  | k) . 


Combining  xQ,  w,  v  in  a  composite  state, 

xT  •  -  Tx  T  w  T  v  T  w  T  v  T1 


and  with  the  composite  measurement  vector, 
Z.T-  .  rz>T  T  T, 

^  •  L  *  i  t  *  2  9  *  *  *  9  i  J  9 


the  measurement  matrix  becomes. 
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H2^0^1 


H2*ir0 


I  0 


0  H2ri 


Hi :  = 


Hi^0 ’  ‘  *^i-l  0  Hi^2’  *  ‘^i-lrl  Hiri-1  1 


n  +  (p  +  m)i 

Then  the  causal  system  can  be  written 
zi  58  Hi  xi  • 

For  w,  v  random  zero  mean  variables  the  predicted  value  for 
the  composite  state  x^^  is 

E[x^]  *  xt  =  [Xq  ,6,...,0] 

with  positive  semi-definite  covariance 
*  E [ Cx±  -  xj) (xi  -  xi)T] 
of  order  n  +  (p  +  m)i. 

For  sequentially  and  colored  noise,  the  correlation  matrix  is, 


;xx(°'°> 

Rx  w 
*0  0 

R*ovi 

Rx  W 

0  1 

Vz 

Rx0vi 

lwoxo 

Rw  w 
w0  0 

Vi 

Rw0Wi 

R“0v2 

Vi 

‘vlx0 

Rv  w 

10 

Vl 

Rvl"l 

Rv  V  •  •  • 

vlv2 

Vi 

• 

• 

Vi-lX0 

Rwi-lw0 

Vi»i 

Vlwl 

Vlvi” 

Vi 

vix0 

Rv  w 
iw0 

Vl 

Vl 

V, 

Vi 
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with  no  sequential  correlation 


R»jwk =  Svk '  Rvjvk 


0  for  k  i  j  , 


the  covariance  matrix  becomes  block  diagonal  and  the 

estimator  can  be  written  in  a  Kalman  filter  type  sequential 
updating  form.  With  =  0 ,  no  correlation  between  the 
noise  sources,  R  becomes  diagonal*  The  estimator  becomes 
now  a  regular  Kalman  filter  for  Gaussian  white  noise.  If  R^ 
is  known,  the  linear  minimum  variance  estimate  x^  of  x^  is, 


x.  + 
1 


RlH1T[H1R1HiTr1t2i  -  HjXj] 


with  error  covariance, 


E[(x.  -  x.)(x*  -  i.)T]  -  Rj  -  R.H^tH.R.H.1]-1  HiRi 


For  an  estimator  of  the  form. 


4  “  xi  +  K.(z.  -  HjXj)  , 


the  error  covariance  of  the  estimate  is 


E  [  (Xj  -  ipCx.  -  -  [I  -  KjH.  ]  R.[I  -  K^.]1 


For  an  upper  estimate  (|.  >  R 


[I  -  K.H.]  R.  [I  -  K.H^1  £  [I  -  K.Hj]  Q.  [I  -  K-HJ7 


(pos.  semidefinite  if  m£n) .  With  the  minimum  variance  gain 


Ki  -  QjHjiHjQjHjr1  , 
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the  bounding  estimator  is 


*i  -  *i  *  QiHjrwjVhij.  -  Hjxji 


EtCxj  -  il)(.xi  -  ij)1]  <  Q4  -  Q1HiT[HiQiH1Tr1H1Qj 


Since  this  estimate  is  a  bounding  estimator  for  every  Q>R,  Q 
can  be  chosen  such  that 

Q  =  diag  Q  , 

and  a  sequentially  updating  bounding  estimator  can  be  designed. 
The  residuals  of  the  bounding  estimator  may  be  tested  for  non¬ 
white  and  correlated  noise.  This  information  can  then  be  used 
to  tighten  the  bounds  of  the  estimator. 


3.6.2  Bounding  Likelihood  Estimators 

In  a  similar  way,  for  the  bounding  linear  minimum  variance 
estimator  bounding  likelihood  estimates  can  be  determined  by 
deweighting  the  innovation  sequence  of  a  sequential  estimator. 
The  negative  log- likelihood  function  becomes  then 

NLLF  =  (y  -  Hx)T  Q_1(y  -  Hx)  +  In | R | , 

where  Q  =  diag  Q  >  R  is  an  upper  bound  of  the  error  covariance. 
This  change  can  be  incorporated  in  existing  state  and  parameter 
estimation  algorithms  in  order  to  get  a  bounding  estimator  for 
correlated  and  non-white  noise. 


3.7  CONCLUSIONS 

The  estimation  problem  for  correlated  as  well  as  non-white 
noise  was  analyzed.  Estimation  for  a  priori  known  correlations 
are  shown.  The  degradation  of  a  filter  due  to  non-whiteness 
of  the  noise  was  shown  for  a  linear  filter  that  assumes  white 
noise.  Estimators  that  give  unbiased  estimates  for  correlated 


50 


and  non-white  noise,  with  bounds  on  the  estimate  error  co- 
variance,  are  shown.  Further  research  is  necessary  to  deal 
with  the  problem  of  non-white  noise  due  to  non-linearities 
and  unmodeled  states. 
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IV.  CONCLUSIONS  AND  RECOMMENDATIONS 


4.1  CONCLUSIONS 

^Analysis  of  test  results  indicates  that  the  measurement 
and  process  noise  is  significantly  non-white  and  non-Gaussian. 

Some  analyses  indicate  that  10%  to  15%  of  the  data  points  may 
deviate  significantly  from  non-Gaussian  distribution.  In  addition, 
numerous  sources  lead  to  non-white  noise. 

These  errors  effect  both  the  accuracy  of  state  and  parameter 
estimates  as  well  as  the  estimation  of  accuracy  levels.  In 
this  report,  techniques  have  been  developed  to  treat  systems 
with  non-white  and  non-Gaussian  noise.  These  techniques  provide 
good  estimates  under  given  whiteness  and  Gaussianess  conditions. 

The  procedures  are  simple  and  can  be  easily  incorporated  in 

the  standard  maximum  likelihood  and  model  structure  determination 

methods. 

4 . 2  RECOMMENDATIONS 

Algorithms  used  in  system  identification  and  parameter 
estimation  should  be  modified  to  include  the  effects  of  non¬ 
white  and  non-Gaussian  noise.  Such  modifications  are  necessary 
to  significantly  improve  the  accuracy  with  which  parameters/states 
are  estimated. 
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APPENDIX  A 

MAXIMUM  LIKELIHOOD  ESTIMATION  IN  THE 
PRESENCE  OF  POISSON  NOISE 


! 

1 


I 
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INTRODUCTION 


In  many  problems  of  interest,  the  state  equations  are  governed 
by 


x  =  Fx  +  Gu  tQ  <  t  <  T  (1) 

and  the  measurements  are  arrival  times  based  on  Poisson  processes. 
The  probability  of  arrival  in  an  interval  t  to  t  +  At  depends 
on  a  linear  combination  of  the  state  variables 

p(t,  t+At)  *  Hx(t) At  (2) 

"•**••*•  •• .  . . 

Consider  a  scalar  arrival  sequence  described  by  the  above  process. 

Given  the  arrival  times  tx  <  t2  <  t3  ...  tN,  the  problem  is 

to  estimate  unknown  parameters  in  F,  G,  and  H. 

This  solution  will  be  useful  in  low-intensity  image  processing 
(where  photon  release  follows  Poisson  process)  and  medical  imaging 
with  radioactive  tracers. 


MAXIMUM  LIKELIHOOD  ESTIMATION 


The  probability  density  functin  for  arrival  times 
tj,  i*l,2,  ...N  follows  the  equation 


N 


i=l 


^(e|  t.)  *  f(tx,t2. . .tN)  =  n 


'*-,/> 


e'ui 


x(t)dt 


i-1 


The  negative  log  likelihood  function  (NLLF)  of  parameters 
given  the  arrival  times  is 

N 

J  «  -  ln[i?(e  |t,)  ]  -  z  ,  [y.-ln  y.] 

1  i=l  1  1 


(3) 

(4) 


(5) 
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The  first  gradient  of  the  NLLF  is 


The  second  gradient  is  obtained  as  follows 


N 

2  [ 
i=l 


30‘ 


(8) 


The  second  gradient  may  be  approximated  as  follows 
2  M  .  ,T 


3 

30 


4=  ?  A  (&)(&) 

*•  ]i 7  '  "  &  '  '  o D  * 


(9) 


For  a  single  count  at  t^,  one  parameter  may  be  estimated  by 
setting 


h 

1 1*  J  Hx(t)dt  *  1 


(10) 


The  second  gradient  matrix  has  a  complex  distribution.  Its 
expected  value  may  be  approximated  by 


ec^4)  ■ 


39 4 


N  /JUjWEiTjJ 


where  t^'s  are  such  that 


“i-r 


Hx(t)dt  -  1 


(11) 


(12) 


*1-1 
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COUNTS  OVER  A  SAMPLE  PERIOD 


The  arrival  times  are  difficult  to  use  in  estimation  when 
there  are  many  occurrences  over  the  time  period  of  observations. 
The  estimation  may  be  based  on  the  number  of  counts  over  a  set 
of  sample  periods.  Let  y^,  y2«..yjj  be  the  number  of  counts 
over  periot  t„,  t  +  t,...t  +N  t.  The  likelihood  function  of 
parameters  0  based  on  measurements  y^ ,  i=l,2...N  is 

(e  |y)  =  f(y|0)  =  n  (y/i  e  ui  )/(y, ,)  (1 

i=l  1 

to+iA 

. y»  *  ■/  -  -  •  Hx(t)dt  ri 

to+(i-l)A  U 

The  negative  log-likelihood  function  is 


J  *  -ln[i2?(9  |y)  ] 


N 

2  f-yi  lnCUi)  + y.  +  ln(y . ,) 
i=l 


The  first  and  the  second  gradients  are 


CIS) 


(16) 


(17) 


An  approximation  to  the  second  gradient  is 


(18) 


A  Newton-Raphson  step  may  be  taken  in  an  iterative  procedure 
as  follows 
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(19) 


This  procedure  may  be  continued  until  convergence  occurs. 
The  expected  value  of  the  second  gradient  matrix  is 


There  is  no  approximation  in  the  above  equation. 


(20) 
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